ABSTRACT

Determination of the minimum resources required to parse a language
generated by a given context free grammar is an intriguing and yet unsolved
problem. It seems plausible that any unambiguous context free grammar
could be parsed in time proportional to the length, n, of each input string.
Early (2) has presented an algorithm which parses "many" grammars in time
proportional to n, but requires n^2 on some. His work is an extension of
Knuth's (4) algorithm, which leads to a very efficient parse proportional
to n of deterministic languages. This Memo presents a different extension
of Knuth's method. Knuth's method fails when more than one alternative
must be examined by a push down automaton making a left to right pass of
the input string. Early's extension takes all possible alternatives
simultaneously without duplication of effort at any given step. The
method presented here continues through the string in order to gain
information which will resolve the conflict in the ensuing right to left
pass, which is made on the symbols accumulated on the stack of the automaton.
The algorithm is probably more efficient than Early's on certain grammars;
it will fail completely on others. The essential idea may be interesting
to those attacking the general problem.
I. Introduction

I will assume that the reader is familiar with context free languages
and the notation given in Knuth's paper (4).

The need to design computer languages makes it important to understand
which languages can be easily parsed. No general unambiguous context
free language parsing algorithm has been proposed which does not require
time at least proportional to n^2, where the string to be parsed is n symbols
long. On the other hand, we have not yet been able to find an unambiguous
grammar which could not be parsed in time proportional to n by some
algorithm. The algorithm below handles a wide class of grammars.

The handle of a sentential form is defined as the leftmost string
of characters in the sentential form which equals the right side of some
rule. Knuth has treated grammars where every handle can be found by
considering the characters to its left, the characters in the handle, and
some fixed number for all handles, k, of characters to the right of the
handle. Such a grammar can be parsed in time proportional to n by a
deterministic automaton scanning the string from left to right. For
example, the grammar G1:

    S → A
    S → cB
    A → aAb
    A → ab
    B → aBbb
    B → abb



has the sentential forms

    1. [A]
    2. a^n [aAb] b^n
    3. a^n [ab] b^n
    4. [cB]
    5. c a^n [aBbb] b^2n
    6. c a^n [abb] b^2n

where the handle has been bracketed in each case.

The handle in 6. is distinguished from the handle in 3. by the "c"
at the left end of the string. It is not necessary to look at characters
to the right of the handle, so k=0, and the grammar is said to be LR(0).

On the other hand, grammar G1 cannot be parsed in this manner from the
right, for one must look arbitrarily far ahead to find the "c" in order
to distinguish 6. from 3. By similar argument, we conclude that a grammar
which generates the language with strings of the form

    c a^n b^n b^n a^n
    a^n b^n b^2n a^n d
    c a^n b^2n b^n a^n d
    a^n b^2n b^2n a^n

can be parsed from neither the left nor the right. In this memo, we extend
Knuth's method to handle languages of this type.
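The claim that G1 cannot be parsed from the right with bounded lookahead can be checked on concrete strings. The sketch below is only an illustration, not part of the memo's algorithm; the function names are hypothetical. It builds one string derived through S → A and one derived through S → cB and counts how many symbols a right to left scan reads before the two strings differ. The count grows with the string length, while the handle inside the shared suffix is "ab" in one string and "abb" in the other.

```python
def a_string(n):
    """A member of L(G1) derived through S -> A: a^n b^n."""
    return "a" * n + "b" * n


def b_string(n):
    """A member of L(G1) derived through S -> cB: c a^n b^2n."""
    return "c" + "a" * n + "b" * (2 * n)


def common_suffix_length(x, y):
    """Number of symbols a right to left scan reads before x and y differ."""
    count = 0
    for cx, cy in zip(reversed(x), reversed(y)):
        if cx != cy:
            break
        count += 1
    return count


for m in (1, 5, 50):
    # a^(2m) b^(2m) and c a^m b^(2m) agree on their last 3m symbols.
    print(m, common_suffix_length(a_string(2 * m), b_string(m)))
```

No fixed amount of lookahead to the left therefore separates the two cases.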


Knuth thinks of the parser as a finite state machine with a push
down stack. At each step the parser must either add an input symbol to its
stack or it must reduce the stack by recognizing that the symbols at the top
of the stack correspond to a handle and replace those symbols on the stack
by the left side of the corresponding rule. In general, this will require
information about characters to the left, such as the "c" in example G1.
Knuth showed that all the useful information about characters, or reductions
already made, to the left can be represented by one of a finite number of
states.

The state is stored on the stack after each symbol which is added to
the stack. The new state is computed from the old state on the top of the
stack and the symbol added. When the stack is reduced, state symbols for
the symbols in the handle are thrown away.
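The shift and reduce cycle can be sketched for G1 as follows. This is a simplified illustration and not Knuth's construction: instead of storing a state after every symbol, it keeps only the one fact the states would carry for this grammar, namely whether the stack begins with "c". The rule tables and the function name are assumptions made for the example.

```python
A_RULES = [("aAb", "A"), ("ab", "A")]
B_RULES = [("aBbb", "B"), ("abb", "B")]


def parse_g1(text):
    """Return the list of reductions for text, or None if text is not in L(G1)."""
    stack, reductions = [], []
    for symbol in text:
        if symbol not in "abc":
            return None
        stack.append(symbol)                       # shift
        # The leading "c" plays the role of the left context held in the state.
        rules = B_RULES if stack[0] == "c" else A_RULES
        for rhs, lhs in rules:
            if "".join(stack[-len(rhs):]) == rhs:  # handle is on top of the stack
                del stack[-len(rhs):]
                stack.append(lhs)                  # reduce
                reductions.append(f"{lhs} -> {rhs}")
                break
    if stack == ["A"]:
        reductions.append("S -> A")
    elif stack == ["c", "B"]:
        reductions.append("S -> cB")
    else:
        return None
    return reductions


print(parse_g1("aabb"))
print(parse_g1("caabbbb"))
```

Each input symbol is shifted once and each reduction inspects a bounded number of stack symbols, so the work is proportional to the length of the input.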

The method gets into trouble when it is not possible to determine a
unique handle with the allowable information. For example, parsing G1 from
the right, we would reach the point abbb^n; the handle could be either ab
or abb. Early (2) allows his parser to explore both possibilities by
maintaining a two-dimensional array instead of a linear stack. The method
used here will be to let the finite state language which summarizes what
has been found thus far become conceptually non-deterministic. We will take
a path for each of the possibilities which could arise if the stack was
reduced. The stack will not be reduced so that no information will be lost.
Without reducing the stack we cannot see what state symbol would have
ended up on top of the stack after the reduction, so more information will
have to be brought along by the non-deterministic finite state summarizing
language. Even though we conceptually take several paths, we often constrain
the possible parsings for the rest of the string to the right. Sometimes as
the parser moves farther to the right, input symbols will be found which
show that some of the paths cannot apply to the string. The stack can be
reduced whenever all current paths indicate the same reduction. Once the
right end of the string has been reached the stack is parsed from the
right using Knuth's algorithm, except that the state symbols from the left
to right pass are used to restrict the possibilities which the algorithm
considers.
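The idea of carrying several paths and dropping them as later symbols rule them out can be illustrated on the G1 example scanned from the right. The sketch below is not the finite state construction of this memo: it uses plain counters, tracks only pruning and not final acceptance, and its names are hypothetical. Path "A" stands for a string a^n b^n (handle ab) and path "B" for c a^n b^2n (handle abb).

```python
def surviving_paths(text):
    """Scan text right to left and record which G1 alternatives remain viable."""
    paths = {"A", "B"}
    a_count = b_count = 0
    seen_c = False
    history = []
    for symbol in reversed(text):
        if seen_c or symbol not in "abc":
            paths.clear()                  # nothing may precede the leading c
        elif symbol == "b":
            if a_count:
                paths.clear()              # no b may stand to the left of an a
            b_count += 1
        elif symbol == "a":
            a_count += 1
            if a_count > b_count:
                paths.discard("A")         # too many a's for a^n b^n
            if 2 * a_count > b_count:
                paths.discard("B")         # too many a's for a^n b^2n
        else:                              # symbol == "c"
            seen_c = True
            paths.discard("A")
            if a_count == 0 or 2 * a_count != b_count:
                paths.discard("B")
        history.append(sorted(paths))
    return history


print(surviving_paths("aaabbb"))
print(surviving_paths("caabbbb"))
```

For "aaabbb" both paths survive until the second "a" from the right has been read; only then does the count of a's exclude path "B".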


II. Description of the Algorithm

We follow Knuth in the description, except that productions of the
form A → ε will not be treated since they add even more complexity. We
define the algorithm to be LR(k1) and RL(k2) for the k1 and k2 used below.

First, let H_k1(σ) be the set of all k1-letter strings β over T ∪ {#}
such that σ ⇒* βγ for some γ. Next, suppose the p-th production of the
grammar to be parsed has the form A_p → X_p1 ... X_pn_p. A state
[p,j;α,m,t] represents the first j letters of the p-th production,
X_p1 ... X_pj, and a string α which could follow the left side of the
p-th production in some sentential form. We also introduce a virtual
state, (p,j;α,m,t). Virtual states are interpreted the same as states,
but the algorithm uses them differently. m is any integer identifying
this state or virtual state from all others created during the parse and
t is a list of integers identifying other states or virtual states.
During the translation process we maintain a stack, denoted by

    S_0 X_1 S_1 X_2 S_2 ... X_n S_n | Y_1 ... Y_k1 ω          (1)

The portion to the left of the vertical line consists alternately of state
sets and characters; this is the portion of the string which has been
considered by the left to right pass. The state sets contain both regular
and virtual states. The steps for the left to right pass are given below.
We start the stack with k2 #'s and then enter the following loop, with
n=0 and S_0 = {[0,0;ε,1,∅]}. At each step, assume the stack contents are
as shown in (1).
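For k1 = 1 and a grammar without productions of the form A → ε, the lookahead sets H reduce to sets of first letters. The following sketch computes them for grammar G1 of the Introduction. It assumes single-character symbols and uses "#" as the end marker; the dictionary layout and function names are hypothetical and are not taken from the memo.

```python
G1 = {"S": ["A", "cB"], "A": ["aAb", "ab"], "B": ["aBbb", "abb"]}


def first_letters(grammar):
    """Map each nonterminal to the terminals that can begin a string derived from it."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                         # iterate to a fixed point
        changed = False
        for nt, right_sides in grammar.items():
            for rhs in right_sides:
                head = rhs[0]              # no empty right sides are allowed
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def h1(sigma, grammar, first):
    """One-letter lookahead set of the string sigma followed by the end marker."""
    head = (sigma + "#")[0]
    return set(first[head]) if head in grammar else {head}


FIRST = first_letters(G1)
print(FIRST["S"], h1("Bb", G1, FIRST), h1("", G1, FIRST))
```

Larger values of k1 would need sets of k1-letter strings rather than single letters; that generalization is not attempted here.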


Step 1. Compute the "closure" S'_n of S_n, which is defined recursively as
the smallest set satisfying the following equation:

    [equation not recoverable from the source; its surviving final clause
    reads "there exists (p,j;α,m',t) formed by the previous application
    of Step 3"]

(We thus have added to S'_n all productions that might apply in addition to
those we are already working on.)


Step 2. Compute the following sets of k1 letter strings:

    Z   = {β | there exists [p,j;α,m,t] in S'_n, j < n_p,
               β in H_k1(X_p(j+1) ... X_pn_p α)}

    Z_p = {α | [p,n_p;α,m,t] in S'_n},  0 ≤ p ≤ the number of productions

Z, Z_0, Z_1, ... must all be disjoint or the grammar is not LR(k1) and
virtual states must be formed.

a) If Y_1 ... Y_k1 lies in Z and not in any Z_p, shift the stack left:

    S_0 X_1 S_1 ... X_n S_n Y_1 | Y_2 ... Y_k1 ω

and rename its contents by letting X_(n+1) = Y_1, Y_1 = Y_2, ..., and go
to Step 3.

b) If Y_1 ... Y_k1 lies in exactly one Z_p and not in Z, then the
characters forming the right side of production p are on the top of the
stack. Check to see if state sets S'_(n-n_p+1) ... S'_(n-1) contain any
states [q,j;α,m,t] for which j = n_q. If this is not so, the stack can be
reduced. Let r = n - n_p; the stack now contains
X_(r+1) S_(r+1) ... X_n S_n; replace this string by A_p to obtain

    S_0 X_1 S_1 ... X_r S_r A_p | Y_1 ... Y_k1 ω

and let n = r, X_(n+1) = A_p. Now go to Step 3.

Otherwise, shift the stack left as in a) above and go to Step 3.
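The decision made in Step 2 can be summarized as a small function. This is a hedged sketch that assumes the sets Z and Z_p have already been computed; it covers only the choice among shifting, reducing and detecting a conflict, and it does not model the additional check in b) that can turn a reduction into a shift. The function name and the return format are hypothetical.

```python
def choose_action(lookahead, shift_set, reduce_sets):
    """Step 2 decision; reduce_sets maps a production number p to its set Z_p."""
    candidates = [p for p, z_p in reduce_sets.items() if lookahead in z_p]
    if lookahead in shift_set and not candidates:
        return ("shift", None)             # case a)
    if len(candidates) == 1 and lookahead not in shift_set:
        return ("reduce", candidates[0])   # case b)
    if not candidates:
        return ("error", None)             # lookahead fits no set at all
    return ("conflict", candidates)        # sets overlap: virtual states needed


print(choose_action("b", {"a", "b"}, {6: {"#"}}))
print(choose_action("#", {"a"}, {6: {"#"}}))
print(choose_action("b", {"b"}, {6: {"b"}, 8: {"b"}}))
```

The "conflict" result marks the situation in which a purely LR(k1) parser would stop and in which the method of this memo forms virtual states instead.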


Step 3. The stack now has the form

    S_0 X_1 S_1 ... X_n S_n X_(n+1) | Y_1 ... Y_k1 ω

Let S_(n+1) be the union of

    {[p,j+1;α,m,t] | [p,j;α,m,t] in S'_n, j < n_p
                     and X_(n+1) = X_p(j+1)}

    {(p,j+1;α,m,t) | (p,j;α,m,t) in S'_n, j < n_p
                     and X_(n+1) = X_p(j+1)}

and of two further sets, which introduce the virtual states; their
defining conditions are not recoverable from the source.


Now form the final S_(n+1) from the set just computed by (a) merging all
states [p,j;α,m,t], [p,j;α,n,s] by forming a state [p,j;α,m,s∪t] and
changing the references to n to refer to m, and (b) doing the same for
all such pairs of virtual states. I indicate the merge as a separate step
because it is thus easier to explain. If the first character to the right
of the bar is "#", begin the right to left pass through the stack;
otherwise, go to Step 1.
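The merge can be pictured with a small sketch. It assumes that a state is stored as a tuple (p, j, α, m, t), that two states are merged when p, j and α agree, and that the identifier of the first state encountered is the one kept; these representation choices are assumptions made for illustration and are not taken from the memo. Regular and virtual states would be passed through the function separately.

```python
def merge_states(states):
    """Merge states that agree in (p, j, alpha); return (merged states, renaming)."""
    groups = {}    # (p, j, alpha) -> (kept identifier m, union of the t lists)
    rename = {}    # dropped identifier -> kept identifier
    for p, j, alpha, m, t in states:
        key = (p, j, alpha)
        if key in groups:
            kept_m, kept_t = groups[key]
            groups[key] = (kept_m, kept_t | set(t))
            rename[m] = kept_m
        else:
            groups[key] = (m, set(t))
    merged = []
    for (p, j, alpha), (m, t) in groups.items():
        # References to a dropped identifier now point at the kept one.
        merged.append((p, j, alpha, m, frozenset(rename.get(x, x) for x in t)))
    return merged, rename


states = [(5, 0, "b", 69, (67,)), (5, 0, "b", 90, (64,)), (6, 0, "b", 70, (90,))]
merged, rename = merge_states(states)
print(merged, rename)
```

Because merged states share one identifier, the number of distinct states in a set stays bounded by the number of distinct (p, j, α) combinations.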

The right to left pass is done in the same manner as the left to
right pass with three exceptions. First, no virtual states are formed; if
the algorithm reaches a point where a virtual state would be formed, it
reports failure. Second, since we are going right to left, it is necessary
to sequence through the characters of the right-hand side of a production
in the opposite direction. When a new state is formed by Step 1, the form
is [q,n_q;α,m,(m')]. Then, in Step 3, j is decremented instead of
incremented. Third, after forming the state set by Step 1, it is replaced
by a restricted set formed as follows. Let S_n be the left-to-right state
set at the top of the input. The restricted set is

    {[p,j;α] | [p,j;α] is in the set formed by Step 1
               and a state with the same p and with j-1 is in S_n}



III. Example

The grammar

    0. S_0 → S#
    1. S → cBaA
    2. S → cAA
    3. S → BaB
    4. S → AB
    5. A → aAb
    6. A → ab
    7. B → aBbb
    8. B → abb



produces the strings

    1. c a^n b^2n a^(n+1) b^n
    2. c a^n b^n a^n b^n
    3. a^n b^2n a^(n+1) b^2n
    4. a^n b^n a^n b^2n

We take k1=1 and k2=0.
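The four kinds of strings can be generated directly from productions 1 to 4, taking A to derive a^n b^n and B to derive a^n b^2n with the same n in both positions. The helper below is only a convenience for experimenting with the example; its names are hypothetical.

```python
def derive_A(n):
    """String derived from A by productions 5 and 6: a^n b^n."""
    return "a" * n + "b" * n


def derive_B(n):
    """String derived from B by productions 7 and 8: a^n b^2n."""
    return "a" * n + "b" * (2 * n)


def example_string(production, n):
    """String produced by S using the given production (1 to 4), without the final #."""
    forms = {
        1: "c" + derive_B(n) + "a" + derive_A(n),   # S -> cBaA
        2: "c" + derive_A(n) + derive_A(n),         # S -> cAA
        3: derive_B(n) + "a" + derive_B(n),         # S -> BaB
        4: derive_A(n) + derive_B(n),               # S -> AB
    }
    return forms[production]


for p in (1, 2, 3, 4):
    print(p, example_string(p, 2))
```

Strings of the first and third kind differ at the left end by the "c" and at the right end by the number of b's, so the "c" decides whether the final block is reduced to A or to B.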



The major steps in parsing a string of the first type are shown on
the following pages. The zero production is not shown in the state sets.
Since the stack can expand and contract, the state sets have been given
two subscripts. The second is the subscript used in Section II. The first
counts the number of times that the stack has reached that length.

In the notation for states or virtual states, the set t has been
omitted when it is empty. If t is a set of one element, that element is
shown.
Steps to Parse String 1.

[The successive stack configurations for the left to right pass over
c a^n b^2n a^(n+1) b^n # and for the ensuing right to left pass are not
recoverable from the source.]
Representative State Sets to Parse String 1.

S_0,0 = { [1,0;#,1] [2,0;#,2] [3,0;#,3] [4,0;#,4] [5,0;a,5,4] [6,0;a,6,4]
          [7,0;a,7,3] [8,0;a,8,3] }

S_0,1 = { [1,1;#,1] [2,1;#,2] [5,0;a,11,2] [6,0;a,12,2] [7,0;a,13,1]
          [8,0;a,14,1] }


S_0,2 = { (1,1;#,1) (2,1;#,2) [5,1;a,11,2] [6,1;a,12,2] [7,1;a,13,1]
          [8,1;a,14,1] [5,0;b,19,11] [6,0;b,20,11] [7,0;b,21,13]
          [8,0;b,22,13] }

S_0,n+1 = { (1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1)
            (5,1;b,23,(15,23)) (7,1;b,24,(17,24)) [5,1;b,25,23]
            [6,1;b,26,23] [7,1;b,27,24] [8,1;b,28,24] [5,0;b,29,25]
            [6,0;b,30,25] [7,0;b,31,27] [8,0;b,32,27] }


S_0,n+2 = { (1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1)
            (5,1;b,23,(15,23)) (7,1;b,24,(17,24)) [6,2;b,26,23]
            [8,2;b,28,24] (5,2;b,35,(15,23)) }

S_0,n+3 = { (1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1)
            (5,1;b,23,(15,23)) (7,1;b,24,(17,24)) [8,3;b,28,24]
            (5,3;b,36,(15,23)) (5,2;b,37,(15,23)) (7,2;b,38,(17,24)) }

S_0,n+4 = { (1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1)
            (5,1;b,23,(15,23)) (7,1;b,24,(17,24)) (5,2;a,42,9)
            (5,2;b,41,(15,23)) (5,3;b,40,(15,23)) (7,3;b,39,(17,24)) }

S_0,n+5 = { (1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1)
            (5,1;b,23,(15,23)) (7,1;b,24,(17,24)) (5,3;a,47,9)
            (5,3;b,46,(15,23)) (7,2;a,45,1) (7,2;b,44,(17,24))
            (7,4;b,43,(17,24)) }



S_0,3n+1 = { (1,1;#,1) (2,1;#,2) (5,1;a,15,2) (5,2;a,50,2)
             (5,1;b,23,(15,23)) (5,2;b,52,(15,23)) (5,3;b,53,(15,23))
             [5,0;#,62,61] [6,0;#,63,61] (1,2;#,60) (2,2;#,61)
             (7,1;a,17,1) (7,2;a,54,1) (7,3;a,55,1) (7,4;a,56,1)
             (7,1;b,24,(17,24)) (7,2;b,57,(17,24)) (7,3;b,58,(17,24))
             (7,4;b,59,(17,24)) (5,3;a,51,9) }


S_0,3n+2 = { (1,3;#,66) [5,1;#,62,61] [6,1;#,63,61] [5,0;b,64,62]
             [6,0;b,65,62] [5,0;#,67,66] [6,0;#,68,66] (2,2;#,61) }

S_0,3n+3 = { (1,3;#,66) (2,2;#,61) (5,1;#,62,61) [5,1;#,67,66]
             [6,1;#,68,66] [5,1;b,64,62] [6,1;b,65,62]
             [5,0;b,69,(67,64)] [6,0;b,70,(67,64)] }


S_0,4n+1 = { (1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
             [5,1;b,69,64] [6,1;b,70,64] [5,0;b,71,69] [6,0;b,72,69] }

S_0,4n+2 = { (1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
             (5,2;b,73,(62,64)) [6,2;b,70,64] }

S_1,4n+1 = { (1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
             (5,2;b,75,(62,64)) [5,2;b,69,64] }

S_1,4n+2 = { (1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
             (5,3;b,76,(62,64)) [5,3;b,69,64] (5,2;b,75,(62,64)) }


S 


1 ,3n+3 

0,0 

0,0 

0,1 


= ((M; 4 , 66 ) (2 ,2 34 , 61 ) [ 5 , 2 : 4 , 62 , 61 ]) 


= ([l,5; e ] [2*4: e ] [3,4; e ) [4,3 Te J [5,4; e J E6,3: e | (7,5;,] tS*4; c ]> 


- <ti, 5 ; e ]) 


(IM: S ]) 


0,2

0,3 

0,2ri+3 

l *2n+l 

1,2n+2 

1,3 

r i>4 


= CU,3; e 1 

- ([?*4; e ]> 

- (f & * 2; e ]) 

- < C&pi 5 e]> 

- ([?p 3; s ]> 

- UM; e ]> 

- (El,2; e |) 

= ([ 1 ji>el) 
IV. Discussion

First, note that the non-deterministic language is indeed finite state.
Since there are a finite number of rules in the grammar and each has a
finite number of characters on its right side, there are only a finite
number of states of the form (p,j;α,m,t) or [p,j;α,m,t] which have
distinct p, j and α. Therefore, the set t must be finite and there are
also only a finite number of possible state sets.

Since the method of constructing the state sets is quite complex, one
might think that the algorithm cannot be very fast. This is not so. The
state sets need be constructed only once; then each state set and string
of k input characters determines a new state set. Thus, the important
thing is the number of state sets, and whether this number can be reduced
for the grammar in question.
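The remark that state sets need be constructed only once amounts to table lookup. The sketch below shows one way to arrange this, assuming that state sets can be represented as hashable values; the class, the stand-in construction function and the data are hypothetical and only illustrate the caching idea.

```python
class TransitionCache:
    """Remembers (state set, lookahead string) -> next state set."""

    def __init__(self, build_state_set):
        self._build = build_state_set      # the expensive construction
        self._table = {}
        self.builds = 0                    # how often the construction really ran

    def next_state_set(self, state_set, lookahead):
        key = (frozenset(state_set), lookahead)
        if key not in self._table:
            self._table[key] = frozenset(self._build(key[0], lookahead))
            self.builds += 1
        return self._table[key]


def toy_build(state_set, lookahead):
    """Hypothetical stand-in: advance every (p, j) pair by one position."""
    return {(p, j + 1) for p, j in state_set}


cache = TransitionCache(toy_build)
first = cache.next_state_set({(5, 0), (6, 0)}, "a")
second = cache.next_state_set({(6, 0), (5, 0)}, "a")
print(first == second, cache.builds)
```

After the table is filled, each parsing step costs one lookup, so the running time depends on the number of distinct state sets rather than on the complexity of constructing them.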

In the example, a character at the left end of the string to be parsed
gives information needed to make a reduction at the right end. Only after
this reduction is made can information needed to make further reductions
at the left end be obtained. The example could be expanded so that it could
be parsed only if the same type of algorithm made three passes through
the string, or so that the algorithm would have to make an arbitrarily
large number of passes.

If the algorithm succeeds after some number of passes, it means that
the information needed to find the correct reductions could be obtained
without having to unwind the pushdown stack in more than one way. Efficiency
is lost because the algorithm which makes a complete pass in one direction
and then in the other may have to consider some characters several times.
This could perhaps be omitted if the trouble spots were somehow remembered
on the first pass.

The language with strings of the form a^n b^n and a^n b^2n cannot be
parsed by this algorithm. The a's must be counted against the b's in both
ways. On the other hand, the language can still be parsed in time
proportional to n by Early's scheme. My algorithm will fail on a palindrome
language and Early's will require effort proportional to n^2. A palindrome
language can be efficiently parsed from both ends toward the middle.
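Working from both ends toward the middle is easy to picture for the simplest case of recognizing a palindrome over any alphabet. The sketch below only checks membership and does not build a parse; it is an illustration of the scanning order, not a parsing method proposed in this memo.

```python
def is_palindrome(text):
    """Compare symbols pairwise, moving inward from both ends."""
    left, right = 0, len(text) - 1
    while left < right:
        if text[left] != text[right]:
            return False
        left += 1
        right -= 1
    return True


print(is_palindrome("abcba"), is_palindrome("abca"))
```

Each symbol is examined once, so the effort is proportional to the length of the string.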

It should be possible to find more general methods of parsing in time
proportional to the string length.